Debugging incidents in Google's distributed systems
Authors
Abstract
Similar resources
D3S: Debugging Deployed Distributed Systems
Testing large-scale distributed systems is a challenge, because some errors manifest themselves only after a distributed sequence of events that involves machine and network failures. D3S is a checker that allows developers to specify predicates on distributed properties of a deployed system, and that checks these predicates while the system is running. When D3S finds a problem it produces the se...
Live debugging of distributed systems
Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, a...
Debugging Distributed Systems with Causal Nets
Modern computing is typically distributed: data are stored on systems that are connected to form huge networks, and computations take place at several locations. Moreover, emergent metaphors for programming modern networks (aka overlay computers) aim at the dynamic composition of distributed computational units. For instance, the service-oriented computing paradigm promotes the composition of dist...
A Debugging Tool for Distributed Systems
This paper describes parts of the design of a debugger for a distributed real-time multimedia system. Emphasis lies on the distributed aspect of debugging, which means that attention is paid to the external behaviour of the processes. This type of debugging is useful for finding communication or synchronization errors. However, experience shows that this is not enough: the debugger must also provid...
Distributed Systems Debugging – State of the Art
Software engineers face many problems when creating, testing, and debugging their applications. Even a small modification of a distributed system can considerably change its behavior. Today's programs in distributed and embedded systems are often designed as long-running applications and are therefore very complex. It is unlikely that all mistakes in such applications are eliminated duri...
Journal
Journal title: Communications of the ACM
Year: 2020
ISSN: 0001-0782,1557-7317
DOI: 10.1145/3397880